exp: refactor (local) executors to start using process manager #7084
- introduce ExecutorManager classes
- move executor init + execution from experiments.__init__ into the manager classes
- make executors serializable via ExecutorInfo
def exec_queue(self, jobs: Optional[int] = 1, detach: bool = False):
    """Run a single WorkspaceExecutor.

    Workspace execution is done within the main DVC process
    (rather than in multiprocessing context)
    """
In theory we could get rid of this entirely and just use WorkspaceExecutor inside the generic base multiprocessing context, but for now keeping the workspace specific behavior at least makes it easier to test exp reproduction (since we don't have to deal with testing function calls from inside multiprocessing child workers).
manager = Manager()
pid_q = manager.Queue()

with ProcessPoolExecutor(max_workers=jobs) as workers:
This should eventually go away entirely in favor of having "attached" execution done by ProcessManager (or rather via an actual task queue that works on top of ProcessManager).
In particular, we should not need to be handling multiprocessing worker signals/PIDs ourselves in exp executor code (it should all be abstracted away by the process manager).
- partially refactor existing pidfile behavior into using proc manager
- old pidfiles are now json-serialized executor info files placed in proc manager subdirs (alongside proc manager pidfile/logs/etc)
- move multiprocessing based (attached) execution into separate execution path from future "detached" execution support
LGTM
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible.
This does not change any user-facing DVC behavior, other than exp pidfile location (which was undocumented/for internal use only)
related to: #6440
prerequisite for: #6267
Currently in the experiments API, we have a clear separation between queuing/staging an experiment (generating a stash commit with the desired changes) and execution/reproduction. The problem right now is that "execution" includes other operations that should not be part of the "execution" step. Essentially, both populating the temporary workspace and calling Repo.reproduce must always be done as a single operation right now. This is OK in the current local execution workflow, but does not work when we get into the remote scenario: populating the workspace is a push operation that must happen on the local machine, and repro'ing the experiment in that workspace is an operation that happens on the remote machine.

This refactor cleans up the exp executors API so that there is a clearer separation between operations that are not actually dependent on each other, and will allow us to serialize the executor state so that we can do different operations in separate DVC commands/processes when needed.
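To illustrate the split described above, here is a minimal sketch of how serialized executor state could let workspace setup and reproduction run as separate steps. It is not the code from this PR: the `ExecutorInfo` fields, the `init_workspace` and `reproduce` helpers, and the info file layout are all hypothetical simplifications chosen for the example.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ExecutorInfo:
    """Hypothetical, simplified stand-in for serializable executor state."""

    root_dir: str      # temporary workspace the experiment runs in
    baseline_rev: str  # Git revision the experiment is based on
    name: str          # experiment name

    def dump(self, path: Path) -> None:
        # Write the state as JSON so another command/process can pick it up.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(asdict(self)))

    @classmethod
    def load(cls, path: Path) -> "ExecutorInfo":
        return cls(**json.loads(path.read_text()))


def init_workspace(info: ExecutorInfo, infofile: Path) -> None:
    """Step 1 (local machine): populate the workspace, then persist state."""
    Path(info.root_dir).mkdir(parents=True, exist_ok=True)
    info.dump(infofile)


def reproduce(infofile: Path) -> str:
    """Step 2 (possibly a different process or machine): reload and run."""
    info = ExecutorInfo.load(infofile)
    # A real executor would call Repo.reproduce here; this sketch only reports.
    return f"reproduced {info.name} at {info.baseline_rev} in {info.root_dir}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Assumed layout: one subdirectory per task holding its info file.
        infofile = Path(tmp) / "proc" / "exp-1" / "exp-1.json"
        info = ExecutorInfo(str(Path(tmp) / "ws"), "abc1234", "exp-1")
        init_workspace(info, infofile)
        print(reproduce(infofile))
```

Because `reproduce` only needs the path to the info file, the second step does not depend on any in-memory object from the first, which is the property the remote scenario requires.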